Constructing support vector machines with missing data
Authors
Abstract
Support vector machine (SVM) classification is a statistical learning method that easily accommodates large numbers of predictors and can discover both linear and non-linear relationships between the predictors and outcomes. A common challenge is constructing an SVM when the training set includes observations with missing predictor values. In this paper, we identify when missing data can bias an SVM classifier. Because the missing-data mechanisms that bias SVMs differ from the traditional framework of missing at random and missing not at random, we argue for an SVM-specific framework for understanding missing data. Further, we compare a number of missing-data strategies for SVMs in a simulation study and a real-data example, and we make recommendations for SVM users based on the simulation study.
Author affiliations: Department of Biostatistics, Vanderbilt University School of Medicine; Department of Biostatistics, University of North Carolina at Chapel Hill; Public Health Sciences Division, Fred Hutchinson Cancer Research Center
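The abstract does not say which missing-data strategies the paper compares. The Python sketch below assumes two common baselines:

- complete-case analysis, which drops any observation with a missing predictor;
- column-mean imputation.

It prepares a toy training set both ways and fits a linear soft-margin SVM by Pegasos-style stochastic subgradient descent. Pegasos is a swapped-in training method, chosen only to keep the example free of dependencies. The function names, the use of None to mark a missing value, the hyperparameters and the toy data are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical convention for this sketch: a missing predictor value is stored as None.


def complete_case(X, y):
    """Complete-case analysis: drop every row that has any missing (None) predictor."""
    kept = [(row, label) for row, label in zip(X, y) if None not in row]
    return [row for row, _ in kept], [label for _, label in kept]


def mean_impute(X):
    """Replace each None with the mean of the observed values in its column.

    A column with no observed values is filled with 0.0 (an arbitrary choice here).
    """
    n_cols = len(X[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM fitted by Pegasos-style stochastic subgradient descent.

    Labels must be +1 or -1. A constant 1.0 feature is appended so an intercept is
    learned; unlike the textbook SVM, this sketch also regularizes that intercept.
    """
    rng = random.Random(seed)
    data = [(list(row) + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # shuffles this local copy only, not the caller's X
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 penalty
            if margin < 1.0:  # hinge loss is active: step toward this point
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, row):
    """Return +1 or -1 from the sign of the decision function w . [row, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(row) + [1.0]))
    return 1 if score >= 0 else -1


# Toy data (hypothetical): two predictors, one missing value in each class.
X = [[2.0, 2.0], [3.0, None], [2.0, 3.0],
     [-2.0, -2.0], [None, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]

X_cc, y_cc = complete_case(X, y)  # keeps 4 of the 6 rows
w_cc = train_linear_svm(X_cc, y_cc)
print(len(X_cc), [predict(w_cc, r) for r in X_cc])  # prints: 4 [1, 1, -1, -1]

X_mi = mean_impute(X)  # keeps all 6 rows; each None becomes its column mean
w_mi = train_linear_svm(X_mi, y)
print([predict(w_mi, r) for r in X_mi])  # training-set predictions after imputation
```

The two strategies distort the training data in different ways:

- Complete-case analysis discards rows; here, six become four.
- Mean imputation keeps every row but moves each missing value to its column mean.

When missingness is related to the outcome, either choice can shift where the fitted boundary lies.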
Similar resources
Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir
The prediction of lithology is necessary in all areas of petroleum engineering: to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...
A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels
The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single-layer feed-forward neural network (SLFNN) with a high training speed. The dimensionless parameter of limiting velocity, known as the densimetric Froude number (Fr), is predicte...
Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data
This paper proposes a method for predicting pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different reservoir models were constructed for a range of viable pore size values. Seismic surveying was then performed on these models. F...
Identification and Adaptive Position and Speed Control of Permanent Magnet DC Motor with Dead Zone Characteristics Based on Support Vector Machines
In this paper, a new type of neural network known as the Least Squares Support Vector Machine, which has gained wide popularity in recent years for identifying nonlinear systems, is used to identify a DC motor with nonlinear dead-zone characteristics. After linearization in each time span, the identified system provides, in an online manner, the model data for the Model Predictive Controller of p...
Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery
Land use is considered an element in land change studies, environmental planning, and natural resource applications. Studying the Earth’s surface by remote sensing has many benefits, such as continuous data acquisition, broad regional coverage, cost-effective data, accurate map data, and large archives of historical data. To study land use/cover, remote sensing as an effic...
Publication date: 2018